Overview

Dataset statistics

Number of variables: 6
Number of observations: 839
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 39.5 KiB
Average record size in memory: 48.2 B

Variable types

Numeric: 6

Warnings

Houses - median sale price ($) is highly correlated with INCOME_8 [High correlation]
INCOME_17 is highly correlated with INCOME_2 [High correlation]
INCOME_2 is highly correlated with INCOME_17 [High correlation]
INCOME_8 is highly correlated with Houses - median sale price ($) [High correlation]
Houses - median sale price ($) is highly correlated with INCOME_17 and 1 other fields [High correlation]
INCOME_17 is highly correlated with Houses - median sale price ($) and 1 other fields [High correlation]
INCOME_2 is highly correlated with Houses - median sale price ($) and 1 other fields [High correlation]
INCOME_17 is highly correlated with INCOME_2 [High correlation]
INCOME_2 is highly correlated with INCOME_17 [High correlation]
INCOME_8 is highly correlated with Houses - median sale price ($) and 1 other fields [High correlation]
Houses - median sale price ($) is highly correlated with INCOME_8 and 1 other fields [High correlation]
INCOME_17 is highly correlated with INCOME_8 and 1 other fields [High correlation]
INCOME_2 is highly correlated with Houses - median sale price ($) and 1 other fields [High correlation]
Houses - median sale price ($) has 68 (8.1%) zeros [Zeros]
INCOME_11 has 57 (6.8%) zeros [Zeros]
INCOME_8 has 46 (5.5%) zeros [Zeros]
INCOME_5 has 43 (5.1%) zeros [Zeros]

Reproduction

Analysis started: 2021-08-18 06:15:16.494892
Analysis finished: 2021-08-18 06:15:24.470179
Duration: 7.98 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Houses - median sale price ($)
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 489
Distinct (%): 58.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.387569658 × 10^-17
Minimum: -1.456628867
Maximum: 9.478944059
Zeros: 68
Zeros (%): 8.1%
Negative: 487
Negative (%): 58.0%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -1.456628867
5-th percentile: -1.001289798
Q1: -0.587273697
median: -0.158315554
Q3: 0.2062988675
95-th percentile: 1.862077299
Maximum: 9.478944059
Range: 10.93557293
Interquartile range (IQR): 0.7935725645

Descriptive statistics

Standard deviation: 1.000596481
Coefficient of variation (CV): 2.953729611 × 10^16
Kurtosis: 15.79000838
Mean: 3.387569658 × 10^-17
Median Absolute Deviation (MAD): 0.411084887
Skewness: 2.993097448
Sum: 2.842170943 × 10^-14
Variance: 1.001193317
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count  Frequency (%)
0                       68     8.1%
-0.4442876493           9      1.1%
-0.5300792779           9      1.1%
-0.587273697            8      1.0%
-0.3727946255           8      1.0%
-0.02962811112          8      1.0%
-0.3298988112           8      1.0%
-0.5157806732           7      0.8%
-0.2441071826           7      0.8%
0.1276565413            6      0.7%
Other values (479)      701    83.6%

Value                   Count  Frequency (%)
-1.456628867            1      0.1%
-1.395144866            1      0.1%
-1.362258075            1      0.1%
-1.273606726            1      0.1%
-1.245009516            1      0.1%
-1.216412307            2      0.2%
-1.202113702            1      0.1%
-1.187815097            1      0.1%
-1.173516492            1      0.1%
-1.159217888            1      0.1%

Value                   Count  Frequency (%)
9.478944059             1      0.1%
6.476237058             1      0.1%
5.618320772             1      0.1%
5.189362629             1      0.1%
4.939137045             1      0.1%
4.517328205             1      0.1%
4.217057505             1      0.1%
4.188460295             1      0.1%
3.830995176             1      0.1%
3.702307733             1      0.1%

INCOME_17
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 708
Distinct (%): 84.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.911976004 × 10^-16
Minimum: -2.508099021
Maximum: 6.935083182
Zeros: 0
Zeros (%): 0.0%
Negative: 408
Negative (%): 48.6%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -2.508099021
5-th percentile: -1.469353797
Q1: -0.6125112686
median: 8.765589081 × 10^-16
Q3: 0.4419316309
95-th percentile: 1.860469056
Maximum: 6.935083182
Range: 9.443182203
Interquartile range (IQR): 1.0544429

Descriptive statistics

Standard deviation: 1.000596481
Coefficient of variation (CV): 2.037054904 × 10^15
Kurtosis: 3.622916504
Mean: 4.911976004 × 10^-16
Median Absolute Deviation (MAD): 0.5489119559
Skewness: 1.035654398
Sum: 4.121147867 × 10^-13
Variance: 1.001193317
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count  Frequency (%)
8.765589081 × 10^-16    118    14.1%
-1.677676295            2      0.2%
0.3080857513            2      0.2%
-0.6660616678           2      0.2%
1.76749978              2      0.2%
-0.7948476671           2      0.2%
1.056948026             2      0.2%
0.3845863216            2      0.2%
-0.3829493211           2      0.2%
0.1062929085            2      0.2%
Other values (698)      703    83.8%

Value                   Count  Frequency (%)
-2.508099021            1      0.1%
-2.298234464            1      0.1%
-2.254743589            1      0.1%
-2.212818867            1      0.1%
-2.2122165              1      0.1%
-2.192217926            1      0.1%
-2.109573216            1      0.1%
-2.065841393            1      0.1%
-2.042951459            1      0.1%
-2.042349092            1      0.1%

Value                   Count  Frequency (%)
6.935083182             1      0.1%
4.098779363             1      0.1%
3.965294903             1      0.1%
3.611103286             1      0.1%
3.379312582             1      0.1%
3.324497213             1      0.1%
3.285825271             1      0.1%
3.274982671             1      0.1%
3.175592166             1      0.1%
2.957174002             1      0.1%

INCOME_11
Real number (ℝ)

ZEROS

Distinct: 731
Distinct (%): 87.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.016270897 × 10^-16
Minimum: -1.769952435
Maximum: 4.097018625
Zeros: 57
Zeros (%): 6.8%
Negative: 418
Negative (%): 49.8%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -1.769952435
5-th percentile: -1.406502766
Q1: -0.6603988005
median: 0
Q3: 0.4519466962
95-th percentile: 2.013562008
Maximum: 4.097018625
Range: 5.86697106
Interquartile range (IQR): 1.112345497

Descriptive statistics

Standard deviation: 1.000596481
Coefficient of variation (CV): 9.84576537 × 10^15
Kurtosis: 1.774099833
Mean: 1.016270897 × 10^-16
Median Absolute Deviation (MAD): 0.5569307033
Skewness: 1.02821267
Sum: 8.526512829 × 10^-14
Variance: 1.001193317
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count  Frequency (%)
0                       57     6.8%
-1.406502766            35     4.2%
-1.406671969            3      0.4%
-1.396688948            2      0.2%
-0.356085844            2      0.2%
0.7508451324            2      0.2%
0.07199966045           2      0.2%
0.2855348014            2      0.2%
0.2848579864            2      0.2%
0.2143000198            2      0.2%
Other values (721)      730    87.0%

Value                   Count  Frequency (%)
-1.769952435            1      0.1%
-1.706162619            1      0.1%
-1.679936037            1      0.1%
-1.61479259             1      0.1%
-1.606163199            1      0.1%
-1.549310736            1      0.1%
-1.465047266            1      0.1%
-1.446604056            1      0.1%
-1.433406163            1      0.1%
-1.426130402            1      0.1%

Value                   Count  Frequency (%)
4.097018625             1      0.1%
3.955564285             1      0.1%
3.868762757             1      0.1%
3.739998699             1      0.1%
3.440846457             1      0.1%
3.411235799             1      0.1%
3.389746922             1      0.1%
3.375364603             1      0.1%
3.320204178             1      0.1%
3.298546097             1      0.1%

INCOME_2
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 737
Distinct (%): 87.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.166043728 × 10^-16
Minimum: -2.821009014
Maximum: 7.127331837
Zeros: 0
Zeros (%): 0.0%
Negative: 502
Negative (%): 59.8%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -2.821009014
5-th percentile: -1.474022022
Q1: -0.6328614945
median: -8.822869829 × 10^-16
Q3: 0.5060786872
95-th percentile: 1.777180796
Maximum: 7.127331837
Range: 9.948340851
Interquartile range (IQR): 1.138940182

Descriptive statistics

Standard deviation: 1.000596481
Coefficient of variation (CV): -1.936871876 × 10^15
Kurtosis: 3.776971607
Mean: -5.166043728 × 10^-16
Median Absolute Deviation (MAD): 0.5881114828
Skewness: 0.9307780909
Sum: -4.334310688 × 10^-13
Variance: 1.001193317
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count  Frequency (%)
-8.822869829 × 10^-16   89     10.6%
0.5881114828            4      0.5%
-0.2607127143           2      0.2%
-0.5419160448           2      0.2%
-0.01613008496          2      0.2%
-0.7801931229           2      0.2%
-0.305942918            2      0.2%
-0.07397139096          2      0.2%
1.287542621             2      0.2%
1.302336414             2      0.2%
Other values (727)      730    87.0%

Value                   Count  Frequency (%)
-2.821009014            1      0.1%
-2.545868714            1      0.1%
-2.42776089             1      0.1%
-2.309168023            1      0.1%
-2.295223054            1      0.1%
-2.206096513            1      0.1%
-2.131399984            1      0.1%
-2.064949176            1      0.1%
-2.062645224            1      0.1%
-2.054399503            1      0.1%

Value                   Count  Frequency (%)
7.127331837             1      0.1%
4.38756985              1      0.1%
3.831347479             1      0.1%
3.483814601             1      0.1%
3.397840836             1      0.1%
3.318536404             1      0.1%
2.859322513             1      0.1%
2.835797957             1      0.1%
2.715264921             1      0.1%
2.658878742             1      0.1%

INCOME_8
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 485
Distinct (%): 57.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -4.234462072 × 10^-17
Minimum: -0.6782751478
Maximum: 13.35409414
Zeros: 46
Zeros (%): 5.5%
Negative: 539
Negative (%): 64.2%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -0.6782751478
5-th percentile: -0.5913414091
Q1: -0.4229393142
median: -0.2103017805
Q3: 0.07492286304
95-th percentile: 1.08465226
Maximum: 13.35409414
Range: 14.03236929
Interquartile range (IQR): 0.4978621772

Descriptive statistics

Standard deviation: 1.000596481
Coefficient of variation (CV): -2.362983689 × 10^16
Kurtosis: 85.35501131
Mean: -4.234462072 × 10^-17
Median Absolute Deviation (MAD): 0.2356946156
Skewness: 7.855874294
Sum: -3.552713679 × 10^-14
Variance: 1.001193317
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count  Frequency (%)
0                       46     5.5%
-0.3127777004           6      0.7%
-0.512605744            5      0.6%
-0.3230252923           5      0.6%
-0.4374567362           5      0.6%
-0.3759711843           5      0.6%
-0.3674315243           5      0.6%
-0.4886946961           5      0.6%
-0.4340408722           4      0.5%
-0.3913425722           4      0.5%
Other values (475)      749    89.3%

Value                   Count  Frequency (%)
-0.6782751478           1      0.1%
-0.6611958278           2      0.2%
-0.6577799638           1      0.1%
-0.6509482358           1      0.1%
-0.6424085758           1      0.1%
-0.6338689159           3      0.4%
-0.6304530519           1      0.1%
-0.6287451199           2      0.2%
-0.6270371879           1      0.1%
-0.6236213239           1      0.1%

Value                   Count  Frequency (%)
13.35409414             1      0.1%
12.7221593              1      0.1%
9.844293888             1      0.1%
7.492471527             1      0.1%
5.506146615             1      0.1%
4.688047188             1      0.1%
4.660720276             1      0.1%
4.281559372             1      0.1%
4.088563057             1      0.1%
3.442964762             1      0.1%

INCOME_5
Real number (ℝ)

ZEROS

Distinct: 770
Distinct (%): 91.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -3.387569658 × 10^-17
Minimum: -3.032266851
Maximum: 7.593931862
Zeros: 43
Zeros (%): 5.1%
Negative: 394
Negative (%): 47.0%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -3.032266851
5-th percentile: -1.55619789
Q1: -0.5058618496
median: 0
Q3: 0.4648576138
95-th percentile: 1.406602762
Maximum: 7.593931862
Range: 10.62619871
Interquartile range (IQR): 0.9707194635

Descriptive statistics

Standard deviation: 1.000596481
Coefficient of variation (CV): -2.953729611 × 10^16
Kurtosis: 9.692946691
Mean: -3.387569658 × 10^-17
Median Absolute Deviation (MAD): 0.4887174713
Skewness: 1.506883529
Sum: -2.842170943 × 10^-14
Variance: 1.001193317
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count  Frequency (%)
0                       43     5.1%
0.154222819             2      0.2%
-0.358364549            2      0.2%
-0.07044799174          2      0.2%
0.2914455399            2      0.2%
0.5083533348            2      0.2%
-0.7195731087           2      0.2%
0.6825645428            2      0.2%
-0.2375811559           2      0.2%
0.1060464562            2      0.2%
Other values (760)      778    92.7%

Value                   Count  Frequency (%)
-3.032266851            1      0.1%
-2.842757935            1      0.1%
-2.598222937            1      0.1%
-2.553699758            1      0.1%
-2.541598586            1      0.1%
-2.468763232            1      0.1%
-2.399809386            1      0.1%
-2.371268886            1      0.1%
-2.357797771            1      0.1%
-2.329942243            1      0.1%

Value                   Count  Frequency (%)
7.593931862             1      0.1%
6.611910361             1      0.1%
5.879218662             1      0.1%
5.095610712             1      0.1%
5.049717589             1      0.1%
4.76454092              1      0.1%
4.230947744             1      0.1%
3.816768018             1      0.1%
3.746672552             1      0.1%
2.560757723             1      0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for points lying on a straight line the steepness of the slope does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
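
As a rough illustration of the calculation described above, the following pure-Python sketch divides the covariance by the product of the standard deviations. The function name `pearson_r` and the toy lists are hypothetical and are not taken from the profiled dataset or from the profiling tool's own code.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n - 1) factors of the covariance and of the two standard
    # deviations cancel, so plain sums of products are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable should leave r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r, r_shifted)
```

The second call illustrates the invariance under changes in location and scale: both printed values should agree up to floating-point rounding.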

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
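
The rank-based calculation described above can be sketched in pure Python as follows. This is a minimal illustration, assuming tied values receive the average of their rank positions; the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: y is a monotonic but nonlinear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Because y increases whenever x does, the ranks agree perfectly and ρ reaches its maximum, while r computed on the raw values stays below 1.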

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
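
The pair-counting definition above can be sketched directly in pure Python. This is a minimal illustration of the formula as stated, assuming that pairs tied in either variable count as neither concordant nor discordant; library implementations often use a tie-corrected variant, which this sketch does not attempt. The function name `kendall_tau` and the toy data are hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie; it contributes to neither count (assumption).
    return (concordant - discordant) / len(pairs)


# Hypothetical toy data with two swapped neighbours.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

tau = kendall_tau(x, y)
print(tau)
```

With these five observations there are 10 pairs, of which 8 are concordant and 2 are discordant, so τ = (8 - 2) / 10 = 0.6.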

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

Row  Houses - median sale price ($)  INCOME_17  INCOME_11  INCOME_2       INCOME_8   INCOME_5
0    0.000000                        2.421670   2.058283   2.458071e+00   -0.391343  -0.067023
1    0.000000                        -0.627269  0.684010   -6.750602e-01  0.039056   -0.299229
2    0.000000                        -1.154461  1.180284   -1.554685e+00  -0.153940  -0.493989
3    -0.444288                       -1.575033  1.163702   -1.825338e+00  0.000000   0.696036
4    -0.575835                       -0.894118  0.675211   -9.194003e-01  0.030517   0.127966
5    -0.494333                       -1.570937  1.586373   -1.149917e+00  0.025393   -1.071649
6    -0.620160                       -1.332882  0.677411   -1.499996e+00  0.180815   0.023165
7    -0.391383                       -1.524675  1.460486   -1.553472e+00  0.132993   -0.259500
8    -0.258406                       -1.327581  1.361501   -1.161315e+00  0.000000   0.465657
9    -0.465021                       -2.042349  1.572668   -8.822870e-16  0.592426   -0.289639

Last rows

Row  Houses - median sale price ($)  INCOME_17  INCOME_11  INCOME_2       INCOME_8   INCOME_5
829  0.413629                        1.990255   2.806671   2.859323e+00   0.223513   -0.395810
830  0.174842                        1.293558   2.630360   1.610823e+00   -0.005350  -1.061831
831  0.549680                        2.137594   2.972660   -8.822870e-16  0.575347   -1.189464
832  0.585212                        1.629919   0.000000   1.940773e+00   0.563391   -0.017933
833  1.099962                        1.375359   3.298546   1.961630e+00   1.296094   0.000000
834  0.234896                        0.553129   2.581630   8.590077e-01   -0.046340  -1.872381
835  2.222402                        3.611103   3.739999   -8.822870e-16  4.688047   5.879219
836  0.442226                        1.630401   0.000000   2.085195e+00   0.435297   -0.636007
837  0.000000                        2.361192   1.705154   2.615225e+00   -0.328149  -1.855028
838  0.000000                        3.285825   1.588065   3.483815e+00   -0.435749  -1.004293